Multi-gas pollutant detection based on sparrow search algorithm optimized ALSTM-FCN

It is critical to identify and detect hazardous, flammable, explosive and poisonous gases in industrial production and medical diagnostics. To detect and classify a range of common hazardous gases, this paper proposes an attention-based Long Short-Term Memory Fully Convolutional Network (ALSTM-FCN). The network parameters of ALSTM-FCN are tuned with the Sparrow Search Algorithm (SSA). In our comparison, SSA outperforms Particle Swarm Optimization (PSO), the Genetic Algorithm (GA), Grey Wolf Optimization (GWO), Cuckoo Search (CS) and other traditional optimization algorithms. We evaluate the model on a dataset from the University of California, Irvine (UCI) repository and compare it with LSTM and FCN. The ALSTM-FCN hybrid model achieves a test accuracy of 99.461%, higher than both LSTM (89.471%) and FCN (96.083%). In addition, AdaBoost, logistic regression (LR), extra trees (ET), decision tree (DT), random forest (RF), K-nearest neighbor (KNN) and other models were trained, and the experimental results show that the proposed approach classifies gases more accurately than these conventional machine learning models. These findings suggest that the SSA-optimized ALSTM-FCN model has potential for detecting a broad range of polluting gases.


Introduction
With the rapid development of industry and rising living standards, polluting gases from farms, vehicles, factories and other sources degrade air quality, which in turn contributes to climate deterioration [1]. Many harmful gases, such as ammonia, ethylene and benzene, can directly harm human health. According to the WHO, air pollution is a leading cause of death from cardiovascular disease, stroke, cancer and chronic respiratory conditions, and it may exacerbate asthma, particularly in children [2]; more than 32% of deaths are attributed to air pollution, and approximately 6.7 million people die prematurely each year because of it, with adverse effects on overall well-being, the climate and the economy [3]. Thus, it is

1. The ALSTM-FCN model was constructed to classify common polluting gases, and its parameters were optimized with the SSA algorithm to improve model performance.
2. Experiments demonstrate the importance of optimizing the model parameters with SSA, and the performance of the model is evaluated on the UCI dataset.
3. The accuracy, precision, recall, F1 score and computational cost of the ALSTM-FCN model are thoroughly examined, and the model's efficacy is further confirmed by comparison with LSTM, FCN, DT and other traditional models.
The rest of this paper is organized as follows. The second section reviews the literature on algorithm models for electronic nose systems. The third section introduces the network model and the dataset. The fourth section presents the evaluation metrics and experimental results of the model. The fifth section discusses the experimental results. Finally, the sixth section concludes the paper and outlines directions for future work.

Literature review
An electronic nose consists of a sensor array and a pattern recognition algorithm. A sensor array identifies gases more effectively than a single sensor [12]. Among the many kinds of gas sensors, metal oxide semiconductor (MOS) sensors are the most widely used [13]. An appropriate pattern recognition algorithm can effectively improve the identification performance of an electronic nose. Popular pattern recognition algorithms include PCA, LDA, SVM and CA [14][15][16][17]. Because of their flexibility, machine learning algorithms can achieve good results on small datasets, and some of them remain widely used today because they are accessible, easy to use and inexpensive. However, gas sensor arrays usually produce high-dimensional data, and conventional pattern recognition algorithms struggle to capture the complex nonlinear relationships within such data; as a consequence, they are poorly suited to processing large datasets. Deep learning has advanced rapidly in recent years: artificial neural networks (ANNs) allow electronic noses to process high-dimensional sensor data, and their performance is significantly higher than that of traditional pattern recognition algorithms [18,19]. Recently, researchers have combined deep learning algorithms with sensor arrays and achieved good gas detection results. Lin et al. used a lightweight residual CNN to successfully identify soybeans of different origins [20]. Gamboa et al. used CNN and SVM to identify target gases [21]. Cai et al. used a CNN-LSTM-AM model for online multi-gas detection during mud logging [22]. To classify the gas information of liquors, Hou et al. proposed a binary-code identification method based on triangle differences [23].
As a network model grows more complex, its number of hyperparameters increases, so selecting appropriate values is essential for good performance. Grid search is a common way to set model parameters, but it becomes very time-consuming when there are many parameters, because the number of candidate combinations grows multiplicatively with each added parameter. Researchers have therefore proposed other optimization algorithms, such as the genetic algorithm (GA), particle swarm optimization (PSO) and grey wolf optimization (GWO) [24]. Among them, PSO and GWO converge slowly in later iterations and easily fall into local optima, while GA is inefficient. This study uses the sparrow search algorithm (SSA) proposed by Xue and Shen in 2020 [25]. SSA has attracted considerable interest because of its strong optimization capability and fast convergence compared with other optimization algorithms. Wang et al. used SSA to solve an optimal configuration model for distributed generation, and experimental simulations validated its effectiveness and superiority [26].
The deep features that a neural network learns from sensor data contain key information for improving classification performance, but they also contain irrelevant information that degrades classification accuracy. To address this problem, researchers have found that combining attention mechanisms with neural networks can effectively strengthen a model's ability to learn important information. Attention mechanisms are now used extensively in image recognition and other fields [27]. The main attention methods include squeeze-and-excitation (SE) [28], efficient channel attention (ECA) [29], the convolutional block attention module (CBAM) [30] and coordinate attention (CA) [31]. In recent years, many studies have combined attention mechanisms with neural networks. Yan et al. classified rice gas information using a channel-spatial cooperative attention technique [11]. Zhang et al. classified the gas information of various spirits using a channel-spatial cooperative attention approach [32]. Men et al. used dynamic attention to determine rice quality at different storage times [33].

Methods
The dataset used for the simulation experiments in this work comes from the UCI repository and contains a variety of hazardous gases commonly found in industrial production. To keep training and testing consistent across models, all algorithms were implemented in Python 3.6 using the PyCharm 2023.1.1 (Community Edition) integrated development environment. All programs were run on Windows 10 (x64) (GPU: NVIDIA GTX 1080 Ti; RAM: 32 GB), with GPU parallel computing used to accelerate deep learning training. TensorFlow was the primary computational framework. The SSA, LSTM, FCN and attention mechanism are first reviewed here.

Sparrow search algorithm (SSA)
The sparrow search algorithm, a novel swarm intelligence optimization technique, was proposed by Xue and Shen in 2020 [25]. Compared with other optimization algorithms, it offers strong optimization ability and fast convergence. SSA is inspired by the foraging and anti-predation behavior of sparrows. The population is divided into producers and followers (scroungers). Because producers have higher energy reserves, they are responsible for finding food-rich areas and guiding the followers; when a producer detects a predator, it leads the population to other areas. Individuals can switch between the producer and follower roles during iterations, but the overall proportion of producers to followers stays constant. The producer's position is updated as follows. Here, $\mathrm{iter}_{\max}$ is the maximum number of iterations, $L$ is a random number drawn from the standard normal distribution, $\alpha$ is a uniformly distributed random number in $(0, 1]$, and $K$ is a $1 \times d$ matrix whose elements are all 1. $X_{i,j}^{t}$ denotes the position of the $i$th sparrow in the $j$th dimension at iteration $t$, and $X_{i,j}^{t+1}$ its updated position. The warning value $R_2 \in [0, 1]$ signals danger: once it reaches the safety threshold $SN \in [0.5, 1]$, the population has detected a predator and must act; while $R_2$ stays below $SN$, the sparrows can forage normally.
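For reference, the producer update described above can be written as follows, based on the original formulation of Xue and Shen [25] and using this section's symbols. The label $\mathrm{iter}_{\max}$ for the maximum number of iterations is our assumption, and $K$ is the $1 \times d$ all-ones matrix.

```latex
X_{i,j}^{t+1} =
\begin{cases}
X_{i,j}^{t} \cdot \exp\!\left(\dfrac{-i}{\alpha \cdot \mathrm{iter}_{\max}}\right), & R_2 < SN \\[6pt]
X_{i,j}^{t} + L \cdot K, & R_2 \ge SN
\end{cases}
```

In the first case no predator has been detected and producers search widely around their current positions; in the second, the population moves randomly toward safer areas.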
Scrounger position update: $X_{P}^{t+1}$ denotes the best position occupied by a producer at iteration $t+1$, and $X_{\mathrm{worst}}^{t}$ the worst position at iteration $t$; $L$ is a random number drawn from the standard normal distribution, $M$ is a $1 \times d$ matrix whose elements are all 1, and $A$ is a $1 \times d$ matrix whose elements are randomly assigned 1 or −1.
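Likewise, a sketch of the scrounger update following Xue and Shen [25], where $n$ (our label) is the population size and $A^{+}$ is the pseudo-inverse of $A$:

```latex
X_{i,j}^{t+1} =
\begin{cases}
L \cdot \exp\!\left(\dfrac{X_{\mathrm{worst}}^{t} - X_{i,j}^{t}}{i^{2}}\right), & i > n/2 \\[6pt]
X_{P}^{t+1} + \left|X_{i,j}^{t} - X_{P}^{t+1}\right| \cdot A^{+} \cdot M, & \text{otherwise}
\end{cases}
\qquad
A^{+} = A^{\mathsf{T}} \left(A A^{\mathsf{T}}\right)^{-1}
```

Scroungers ranked in the worse half ($i > n/2$) have poor fitness and fly elsewhere to forage; the rest forage near the best producer.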
Early-warning sparrow position update: $X_{\mathrm{best}}^{t}$ is the current global best position, and $\beta$ is a step-size control parameter drawn from a normal distribution. $P \in (-1, 1)$ is a random number that controls the sparrow's direction of movement and step size. $f_i$ is the current fitness value of the $i$th sparrow, $f_g$ and $f_w$ are the current global best and worst fitness values, and $\varepsilon$ is a small constant that keeps the denominator from becoming zero.
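The early-warning update can be sketched in the same way, again following Xue and Shen [25]; $X_{\mathrm{worst}}^{t}$ is the worst position at iteration $t$:

```latex
X_{i,j}^{t+1} =
\begin{cases}
X_{\mathrm{best}}^{t} + \beta \cdot \left|X_{i,j}^{t} - X_{\mathrm{best}}^{t}\right|, & f_i > f_g \\[6pt]
X_{i,j}^{t} + P \cdot \dfrac{\left|X_{i,j}^{t} - X_{\mathrm{worst}}^{t}\right|}{(f_i - f_w) + \varepsilon}, & f_i = f_g
\end{cases}
```

A sparrow with $f_i > f_g$ sits at the edge of the population and moves toward the global best; a sparrow already at the best position ($f_i = f_g$) takes a random step scaled by its distance to the worst position.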

Long short-term memory recurrent neural network (LSTM-RNN)
The LSTM network, a type of recurrent neural network (RNN), was first presented by Hochreiter and Schmidhuber in 1997 [34]. Unlike feedforward neural networks, LSTM can analyze long input sequences: it describes and propagates information across a long time series without losing track of crucial historical information, and it alleviates the vanishing/exploding gradient problem of plain RNNs. As Fig 1 illustrates, the memory cell is equipped with a forget gate, an input gate and an output gate that allow information to flow selectively [35]. The forget gate selectively discards information while retaining what is important in the cell state. The input gate controls the incoming data flow and decides which new information is stored in the cell state. The output gate decides which part of the cell state is exposed as the hidden state at the current time step. Through matrix multiplications and nonlinear activations, these three gates regulate the memory cell so that important information is not lost over successive iterations.
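The gate logic of a standard LSTM cell (without peephole connections) can be sketched in NumPy as a single time step. This is a minimal illustration, not the authors' TensorFlow implementation; the dictionary layout of the weights and the function names are hypothetical conveniences, with keys mirroring this section's notation ($G$ input weights, $L$ recurrent weights, $b$ biases for the gates i, f, o and the candidate c).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, G, L, b):
    """One LSTM time step. G, L, b are dicts keyed by 'i', 'f', 'o', 'c'."""
    # Pre-activations: input weights on x_t, recurrent weights on h_{t-1}, plus bias.
    pre = {k: G[k] @ x_t + L[k] @ h_prev + b[k] for k in ("i", "f", "o", "c")}
    i_t = sigmoid(pre["i"])        # input gate: how much new content to write
    f_t = sigmoid(pre["f"])        # forget gate: how much old cell state to keep
    o_t = sigmoid(pre["o"])        # output gate: how much cell state to expose
    c_tilde = np.tanh(pre["c"])    # candidate cell content
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden, n_in = 8, 4
    G = {k: rng.normal(0, 0.1, (hidden, n_in)) for k in "ifoc"}
    L = {k: rng.normal(0, 0.1, (hidden, hidden)) for k in "ifoc"}
    b = {k: np.zeros(hidden) for k in "ifoc"}
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in rng.normal(size=(24, n_in)):   # run over a toy sequence of length 24
        h, c = lstm_step(x_t, h, c, G, L, b)
    print(h.shape)
```

With all weights set to zero every gate outputs 0.5 and the candidate is 0, so the cell state simply halves at each step, which is a quick sanity check of the wiring.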
Typically, the LSTM unit performs the following computations, where $x_t$ is the input and $h_{t-1}$ the previous hidden state:

$$i_t = \sigma\left(G_i x_t + L_i h_{t-1} + b_i\right)$$
$$o_t = \sigma\left(G_o x_t + L_o h_{t-1} + b_o\right)$$
$$f_t = \sigma\left(G_f x_t + L_f h_{t-1} + b_f\right)$$
$$\tilde{c}_t = \tanh\left(G_c x_t + L_c h_{t-1} + b_c\right)$$

Here, $G_i$, $G_o$, $G_f$ and $G_c$ are the weights applied to the input data for each gate, $L_i$, $L_o$, $L_f$ and $L_c$ are the recurrent weights applied to the preceding state, $b_i$, $b_o$, $b_f$ and $b_c$ are the bias terms, and $\sigma$ is the sigmoid activation function. The state to be preserved at the current time step is computed as

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

and, finally, the current hidden state is

$$h_t = o_t \odot \tanh\left(c_t\right)$$

Fully convolutional network (FCN)
CNNs have proven effective for time series classification [36]. When data are fed in, each convolutional kernel slides over the input with a designated stride to extract features. FCN, a CNN variant, is able to mitigate the overfitting and gradient explosion problems of CNNs. FCN contains no local pooling layers, replaces the fully connected layers of a CNN with convolutional layers, and uses a global average pooling (GAP) layer in place of the conventional final fully connected (FC) layer [37]. Convolutional layers are the basic building blocks of FCN, and each applies a nonlinear transformation to the input time series. Each convolutional layer is followed by a batch normalization (BN) layer and a rectified linear unit (ReLU) activation layer; after the last convolutional block, the extracted features are passed to the GAP layer. Finally, a softmax layer produces the class labels.
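A minimal NumPy sketch of the FCN feature extractor described above: each block applies a "same"-padded 1-D convolution, a normalization step and ReLU, and global average pooling collapses the time axis. Assumptions: the normalization uses per-sample statistics without learned scale and shift (a simplification of real batch normalization), the weights are random stand-ins rather than trained parameters, and the filter and kernel settings (128/256/128 filters, kernels 8/5/3) are taken from the ALSTM-FCN description in this paper. Treating the 24 PCA features as a univariate sequence is also an assumption made for illustration; the function names are hypothetical.

```python
import numpy as np


def conv1d_same(x, W):
    """x: (T, C_in); W: (k, C_in, C_out). Stride-1 cross-correlation, 'same' padding."""
    k = W.shape[0]
    left, right = (k - 1) // 2, k // 2           # same split as TensorFlow 'same'
    xp = np.pad(x, ((left, right), (0, 0)))      # zero-pad along time only
    T = x.shape[0]
    return sum(xp[m:m + T] @ W[m] for m in range(k))


def fcn_block(x, W, eps=1e-5):
    """Convolution -> normalisation over time (BN stand-in) -> ReLU."""
    y = conv1d_same(x, W)
    y = (y - y.mean(axis=0)) / np.sqrt(y.var(axis=0) + eps)
    return np.maximum(y, 0.0)


def fcn_features(x, kernels=(8, 5, 3), filters=(128, 256, 128), seed=0):
    rng = np.random.default_rng(seed)
    h = x
    for k, f in zip(kernels, filters):
        W = rng.normal(0.0, 0.1, size=(k, h.shape[1], f))   # random stand-in weights
        h = fcn_block(h, W)
    return h.mean(axis=0)                        # global average pooling over time


if __name__ == "__main__":
    x = np.random.default_rng(1).normal(size=(24, 1))   # toy univariate sequence
    print(fcn_features(x).shape)                 # one value per final filter
```

Because GAP averages each channel over time, the output length equals the number of filters in the last block regardless of the sequence length.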

Attention mechanism
Because of its limited memory, LSTM's performance can decline quickly when handling long-term dependencies in long sequences. The attention mechanism, a concept proposed by Bahdanau et al. [38], can address this problem. It is inspired by the visual system of animals, which lets them focus on specific objects. Applying attention to the LSTM can improve the robustness of the model by making the network focus on features relevant to the output and ignore interfering information. The output element $c_i$ is computed from an input sequence $(u_1, u_2, \ldots, u_n)$, where $n$ is the maximum length of the input sequence.
Each annotation $u_i$ contains information about the entire input sequence, with a focus on the elements surrounding the $i$th element. $c_i$ can be expressed by Formula 10:

$$c_i = \sum_{j=1}^{n} G_{ij} u_j \qquad (10)$$

where $G_{ij}$ is the weight of each annotation $u_j$, calculated as

$$G_{ij} = \frac{\exp\left(u_i^{\mathsf{T}} u_j\right)}{\sum_{k=1}^{n} \exp\left(u_i^{\mathsf{T}} u_k\right)}$$

Here, $u_i$ and $u_j$ are the $i$th and $j$th annotations, and $\mathsf{T}$ denotes the vector transpose.
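The following NumPy sketch computes the context vectors $c_i = \sum_j G_{ij} u_j$ for all positions at once, assuming an unparameterized dot-product score $u_i^{\mathsf{T}} u_j$ normalized by a softmax over $j$. Practical attention layers usually add learned projections, which are omitted here; the function name is hypothetical.

```python
import numpy as np


def dot_product_attention(U):
    """U: (n, d) annotations u_1..u_n. Returns contexts C (n, d) and weights G (n, n)."""
    U = np.asarray(U, dtype=float)
    scores = U @ U.T                               # e_ij = u_i^T u_j
    scores -= scores.max(axis=1, keepdims=True)    # stabilise exp; softmax is shift-invariant
    G = np.exp(scores)
    G /= G.sum(axis=1, keepdims=True)              # normalise over j: sum_j G_ij = 1
    C = G @ U                                      # c_i = sum_j G_ij u_j
    return C, G


if __name__ == "__main__":
    U = np.random.default_rng(0).normal(size=(5, 3))
    C, G = dot_product_attention(U)
    print(C.shape, G.shape)
```

Each row of `G` is a probability distribution over the input positions, so every context vector is a convex combination of the annotations.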

ALSTM-FCN
In the proposed attention-based LSTM-FCN model, the input data are routed through two parallel networks, an LSTM and an FCN, which extract features separately; their outputs are concatenated into one vector and classified with a softmax layer. The gas classification flow chart based on ALSTM-FCN is shown in Fig 2. The FCN contains three temporal convolution blocks, each consisting of one convolutional layer; the three layers have 128, 256 and 128 filters with kernel sizes of 8, 5 and 3, respectively. Each convolutional layer is followed by batch normalization, which in turn is followed by a ReLU activation. A squeeze-and-excitation (SE) block is inserted after each of the first two convolution blocks. The SE module, proposed by Hu et al. [28], measures the importance of each feature channel and suppresses less influential features, thereby improving the classification performance of the model.
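To make the SE step and the fusion head concrete, here is a NumPy sketch of a squeeze-and-excitation block in the spirit of Hu et al. [28], together with the concatenation-plus-softmax classifier described above. The reduction ratio of 16 (128 to 8 channels), the weight shapes and the function names are assumptions for illustration, not the authors' code.

```python
import numpy as np


def squeeze_excite(x, W1, W2):
    """x: (T, C) feature map; W1: (C, C // r); W2: (C // r, C)."""
    s = x.mean(axis=0)                       # squeeze: global average over time -> (C,)
    z = np.maximum(s @ W1, 0.0)              # excitation bottleneck with ReLU
    w = 1.0 / (1.0 + np.exp(-(z @ W2)))      # sigmoid channel weights in (0, 1)
    return x * w                             # rescale every channel of the feature map


def fuse_and_classify(lstm_vec, fcn_vec, W_out, b_out):
    """Concatenate the LSTM and FCN feature vectors, then apply a softmax layer."""
    feats = np.concatenate([lstm_vec, fcn_vec])
    logits = feats @ W_out + b_out
    e = np.exp(logits - logits.max())        # numerically stable softmax
    return e / e.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(24, 128))
    y = squeeze_excite(x, rng.normal(0, 0.1, (128, 8)), rng.normal(0, 0.1, (8, 128)))
    probs = fuse_and_classify(rng.normal(size=8), y.mean(axis=0),
                              rng.normal(0, 0.1, (136, 6)), np.zeros(6))
    print(probs.round(3))                    # six class probabilities (one per gas)
```

The SE weights only rescale channels, so the feature map keeps its shape; channels whose gate is near 0 are effectively suppressed.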

Parameter optimization
In machine learning and deep learning, hyperparameter selection is an important part of model training, and an optimization strategy that adjusts hyperparameters automatically is crucial for improving classification accuracy. Hyperparameters are parameters that must be fixed before training, such as the learning rate, the maximum depth of a decision tree or the number of layers of a neural network. In this study, SSA is used to optimize the learning rate, dropout and batch size of the model. The SSA parameters are as follows: population size 50, maximum number of iterations 40, proportion of producers (discoverers) 0.2, proportion of early-warning sparrows (watchers) 0.1, alarm value 0.8, and position bounds popmax = 5 and popmin = −5. The search ranges are [0.0001, 0.01] for the learning rate, [0.1, 1] for dropout and [10, 200] for batch size. The optimized values are a learning rate of 0.0005, a dropout of 0.8 and a batch size of 128. To further verify the suitability of SSA, the unoptimized ALSTM-FCN model was compared with ALSTM-FCN models tuned by five optimization algorithms (PSO, GA, GWO, CS and SSA). To ensure that the classification results generalize and are reliable, all results are averaged over 10 runs. Every optimized ALSTM-FCN model is more accurate than the unoptimized one. The SSA-ALSTM-FCN model shows only a slight difference between training-set and test-set results, indicating better generalization than the models tuned by the other algorithms; SSA is therefore better suited to building accurate gas classification models.
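A compact NumPy sketch of SSA with the settings listed above (50 sparrows, 40 iterations, producer ratio 0.2, watcher ratio 0.1, alarm threshold 0.8, bounds ±5), following the standard producer, scrounger and early-warning updates of Xue and Shen [25] in simplified form. In the study, the fitness would be the validation error of an ALSTM-FCN trained with the decoded hyperparameters; because that is expensive, the usage below minimizes a toy sphere function instead. The linear `decode` mapping from a position in [−5, 5] to the learning-rate, dropout and batch-size ranges is a hypothetical choice, not necessarily the authors'.

```python
import numpy as np


def ssa_minimize(fitness, dim, pop=50, iters=40, pd_ratio=0.2, sd_ratio=0.1,
                 st=0.8, lb=-5.0, ub=5.0, seed=0):
    """Minimise `fitness` over [lb, ub]^dim with a basic sparrow search."""
    rng = np.random.default_rng(seed)
    n_prod = max(1, int(pop * pd_ratio))      # producers (discoverers)
    n_watch = max(1, int(pop * sd_ratio))     # early-warning sparrows (watchers)
    X = rng.uniform(lb, ub, size=(pop, dim))
    fit = np.array([fitness(x) for x in X])
    best_i = int(np.argmin(fit))
    best_x, best_f = X[best_i].copy(), float(fit[best_i])
    history = [best_f]

    for _ in range(iters):
        order = np.argsort(fit)               # best sparrows first
        X, fit = X[order], fit[order]
        worst_x = X[-1].copy()

        # Producers: wide search if no alarm (R2 < ST), random jump otherwise.
        r2 = rng.random()
        for i in range(n_prod):
            if r2 < st:
                alpha = 1.0 - rng.random()    # uniform on (0, 1]
                X[i] = X[i] * np.exp(-(i + 1) / (alpha * iters))
            else:
                X[i] = X[i] + rng.normal() * np.ones(dim)
        X[:n_prod] = np.clip(X[:n_prod], lb, ub)
        prod_fit = np.array([fitness(x) for x in X[:n_prod]])
        x_p = X[int(np.argmin(prod_fit))].copy()   # best producer position

        # Scroungers: the worse half flies away, the rest follow the best producer.
        for i in range(n_prod, pop):
            if i + 1 > pop / 2:
                X[i] = rng.normal() * np.exp((worst_x - X[i]) / (i + 1) ** 2)
            else:
                A = rng.choice([-1.0, 1.0], size=dim)
                a_plus = A / dim              # A^T (A A^T)^-1 for a +-1 row vector
                X[i] = x_p + (np.abs(X[i] - x_p) @ a_plus) * np.ones(dim)
        X = np.clip(X, lb, ub)
        fit = np.array([fitness(x) for x in X])
        if fit.min() < best_f:
            best_i = int(np.argmin(fit))
            best_x, best_f = X[best_i].copy(), float(fit[best_i])
        f_worst = float(fit.max())

        # Watchers: randomly chosen sparrows react to danger.
        for i in rng.choice(pop, size=n_watch, replace=False):
            if fit[i] > best_f:
                X[i] = best_x + rng.normal() * np.abs(X[i] - best_x)
            else:
                X[i] = X[i] + rng.uniform(-1, 1) * (
                    np.abs(X[i] - worst_x) / (fit[i] - f_worst + 1e-50))
            X[i] = np.clip(X[i], lb, ub)
            fit[i] = fitness(X[i])
            if fit[i] < best_f:
                best_x, best_f = X[i].copy(), float(fit[i])
        history.append(best_f)
    return best_x, best_f, history


def decode(position, lb=-5.0, ub=5.0):
    """Hypothetical linear map from a position to (learning rate, dropout, batch size)."""
    ranges = [(1e-4, 1e-2), (0.1, 1.0), (10, 200)]
    u = (np.asarray(position, dtype=float) - lb) / (ub - lb)   # scale to [0, 1]
    lr, dropout, batch = (lo + ui * (hi - lo) for ui, (lo, hi) in zip(u, ranges))
    return float(lr), float(dropout), int(round(batch))


if __name__ == "__main__":
    best_x, best_f, history = ssa_minimize(lambda x: float(np.sum(x ** 2)), dim=3)
    print(best_f, decode(best_x))
```

Keeping the global best outside the population makes the best-so-far fitness non-increasing, which mirrors how the optimized hyperparameters would be reported.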

Data preparation
This work uses the UCI dataset "Gas Sensor Array Drift Dataset at Different Concentrations", available at http://archive.ics.uci.edu/ml/datasets/Gas+Sensor+Array+Drift+Dataset+at+Different+Concentrations. The sensor array comprises four types of gas sensors (TGS-2600, TGS-2602, TGS-2610 and TGS-2620), with four sensors of each type. Their sensitivity differs depending on the gas; Table 2 lists the selectivity of the gas sensors.
During the experiment, the sensor array was housed in a sealed 60 mL chamber, and gas samples were injected at a constant flow rate of 200 mL/min. The conductivity (S/m) of the sensors was recorded continuously at a sampling frequency of 100 Hz, and each measurement consists of data collected continuously from the set of sixteen sensors. The samples cover six gases at different concentrations: ammonia, acetaldehyde, acetone, ethylene, ethanol and toluene, with concentration ranges of (50, 1000), (5, 500), (12, 1000), (10, 300), (10, 600) and (10, 100) ppm, respectively. The dataset contains 13910 gas samples, each of length 128.
For convenience, the six gases are numbered from 1 to 6; the numbering and the corresponding model classification labels are shown in Table 3. Principal component analysis (PCA) is a common method for visualizing multi-dimensional data from gas sensor arrays [39]. The number and distribution of gas samples in the dataset are shown in Fig 4, and more details about the dataset can be found in [40].

The whole experimental process
The experiment described in this article proceeded as follows. The data are first preprocessed and analyzed and then split into training and test sets. Next, the ALSTM-FCN model is built and trained on the processed data, with its parameters optimized by SSA during training. Finally, the test set is used to evaluate the trained model's performance.

Data processing
Because sensor response magnitudes vary widely, normalization maps the data into a suitable input space for the classification model and prevents prediction errors caused by large discrepancies between sensor outputs. In the experiment, the data are normalized to the range [0, 1], which improves the speed and accuracy of the classification model:

$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

Here, $x_i$ is the original data, $x_i'$ the normalized value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the response data.
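A minimal NumPy version of this min-max scaling, assuming each sensor feature (column) is scaled with its own minimum and maximum; mapping constant columns to 0 to avoid division by zero is our own convention, and the function name is hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each feature (column) of X to [0, 1] using its own min and max."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # avoid 0/0 for constant columns
    return (X - x_min) / span


if __name__ == "__main__":
    print(min_max_normalize([[0, 10], [5, 20], [10, 30]]))
```

In a strict train/test protocol, `x_min` and `x_max` would be computed on the training set only and reused for the test set.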
PCA is a popular technique for processing high-dimensional feature data: it uses singular value decomposition to decompose a data matrix into a set of uncorrelated variables known as principal components. It suppresses noise and redundant features while retaining the most significant ones, thereby speeding up data processing [41]. As expected, the relative information content (RIC) approaches its maximum possible value of 1.00 rapidly over the first few principal directions, as shown in Fig 6.
In fact, the first 24 principal directions already account for 99.5% of the RIC, so the unit vectors of these 24 directions are chosen as the basis of the low-dimensional space. Projecting the data onto this basis yields a 13910 × 24 feature matrix. After the PCA reduction, the samples were split into training and test sets at a ratio of 8:2.
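The PCA reduction and 8:2 split can be sketched with NumPy's SVD. Here RIC is assumed to be the cumulative share of variance (squared singular values) captured by the leading directions, and the random permutation used for the split is an assumption, since the sampling scheme is not specified; the function names are hypothetical.

```python
import numpy as np


def pca_by_ric(X, threshold=0.995):
    """Project centred data onto the fewest principal directions with cumulative RIC >= threshold."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # PCA operates on centred data
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    ric = np.cumsum(s ** 2) / np.sum(s ** 2)      # cumulative share of variance
    k = min(int(np.searchsorted(ric, threshold)) + 1, len(s))
    return Xc @ vt[:k].T, ric


def split_8_2(X, y, seed=0):
    """Shuffle and split samples into 80% training and 20% test data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(0.8 * len(X))
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], X[te], y[tr], y[te]


if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(100, 16))
    Z, ric = pca_by_ric(X)
    print(Z.shape, ric[:3].round(3))
```

For the 13910 samples used here, an 8:2 split gives 11128 training and 2782 test samples.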

Gas classification identification
To assess the viability of the proposed approach, we first compared ALSTM-FCN with the LSTM and FCN models. Learning curve is an important index algorithm for

To characterize the three models' capacity more accurately, we evaluated the classification results for the six gases with three indicators, precision (P), recall (R) and F1 score [42]:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times P \times R}{P + R}$$

where TP, FP, TN and FN stand for true positives, false positives, true negatives and false negatives, respectively. As shown in Fig 8, the ALSTM-FCN model outperforms the LSTM and FCN models on all three indicators, and the classification of acetaldehyde in particular improves significantly.
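Per-class precision, recall and F1 follow directly from these counts; the sketch below computes them in a one-vs-rest fashion for integer gas labels such as the 1 to 6 numbering of Table 3. Returning 0 when a denominator is empty is a convention we assume, and the function name is hypothetical.

```python
import numpy as np


def per_class_prf(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    out = {}
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        out[c] = (float(p), float(r), float(f1))
    return out


if __name__ == "__main__":
    print(per_class_prf([1, 1, 2, 2, 3, 3], [1, 2, 2, 2, 3, 1], labels=[1, 2, 3]))
```

TN does not enter any of the three indicators, which is why it is not counted.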
To further illustrate the efficacy of the model, we compared ALSTM-FCN with several other models, including AdaBoost, K-nearest neighbor (KNN), random forest (RF), logistic regression (LR), extra trees (ET) and decision tree (DT). All models were run 10 times, and the experimental results are shown in Table 4. The proposed network achieves the highest accuracy (99.461%) and the lowest loss (0.028), outperforming the other commonly used classification models. Owing to its greater complexity, the ALSTM-FCN model has the longest training time (431 s); because it is better on every other indicator, this time cost is acceptable.

Discussion
Electronic nose systems have been widely used in environmental engineering [43], and the choice of algorithm strongly affects electronic nose performance. In this study, the ALSTM-FCN model is applied to an electronic nose system for the first time to accurately classify a variety of common harmful gases. Previous studies have shown that optimizing model parameters with optimization algorithms can significantly improve model performance. For example, Sharma et al. used a CNN model optimized by GWO to detect sugarcane diseases [44], and Zhang et al. used PSO to optimize an SVM and successfully distinguished wheat grades [45]. In this study, the recently proposed SSA was likewise used to optimize the model's hyperparameters, as shown in Fig 3, and the optimized models indeed achieve higher classification accuracy. In addition, SSA is less prone to becoming trapped in local optima than classical optimization algorithms such as PSO and GWO.
Previous studies often used a single model for data processing; for example, Zhang et al. used an LSTM model for gas leakage detection [46], and Shin et al. used an FCN model to monitor indoor air quality [47]. In contrast, the fusion model proposed in this study has clear advantages in extracting important features from the data. As shown in Fig 7, the ALSTM-FCN model has the best classification performance and converges faster than the single models (LSTM and FCN), possibly because FCN compensates for LSTM's limited ability to learn high-dimensional features.
In addition, baseline drift occurs when a gas sensor collects gas data for a long time [48]. In previous studies, Se et al. built a sensor drift compensation framework that handles drift compensation and gas classification as multiple tasks [49], and Oh et al. added a drift compensation module to a classification model to improve gas classification accuracy [50]. By contrast, the ALSTM-FCN model proposed in this study accurately captures the characteristics of the various gas data without drift compensation, reaching a gas classification accuracy of 99.461%. Compared with classical classification algorithms such as DT and RF, it achieves the best results on every indicator except training time.

Conclusions and future works
This paper proposes an attention-based LSTM-FCN multi-gas classification model for detecting and classifying many common toxic and harmful gases in industrial production. The model captures temporal dependencies with LSTM and extracts high-level features with FCN, and the attention mechanism assigns higher weights to important information, which improves classification accuracy. To further improve the model, we compared SSA with PSO, GWO and other optimization algorithms; SSA yielded better generalization. The model was trained and tested on the UCI dataset, and after SSA optimization its average recognition accuracy over all gases reached 99.461%. We also compared the ALSTM-FCN model with LSTM, FCN, KNN and other classical classification models, and ALSTM-FCN achieved the highest average accuracy (98.148%). This work has potential application value and offers useful guidance for sensor array-based detection of hazardous gases. Although the proposed ALSTM-FCN model classifies a variety of toxic gases well, its complexity means that it has more network parameters than classical machine learning models, so there is still room to reduce its training time. In future work, we will refine the data preprocessing method through more extensive experiments to improve training efficiency. We also hope to apply the proposed method to other applications and gas sensor systems to detect more harmful gases in production and daily life and to further evaluate its classification performance.

Fig 1. Architecture of an LSTM. https://doi.org/10.1371/journal.pone.0310101.g001

Fig 2 provides an overview of the steps involved in computing the squeeze-and-excitation block in our design. The construction and operation of the proposed model on the Python platform are shown in Table 1.

Fig 4. (a) The number of six gas samples in the dataset; (b) Data distribution of gas samples.